Abstract. We give the asymptotic distribution of the length of partial coalescent trees
for Beta and related coalescents. This allows us to give the asymptotic distribution of the
number of (neutral) mutations in the partial tree. This is a first step toward studying the asymptotic
distribution of a natural estimator of the DNA mutation rate for species with large families.



1. Introduction 



1.1. Motivations. The Kingman coalescent, see [15, 16], describes the genealogy of $n$ individuals in a Wright-Fisher model when the size of the whole population is very large and time is suitably rescaled. In what follows, we consider only neutral mutations and the infinite allele model, where each mutation gives a new allele. The Watterson estimator [22], based on the number $K^{(n)}$ of different alleles observed among $n$ individuals alive today, estimates the mutation rate $\theta$ of the DNA. This estimator is consistent and converges at rate $1/\sqrt{\log(n)}$.

Other population models, in which one individual can produce a large number of children, give rise to coalescent processes more general than the Kingman coalescent, in which multiple collisions appear, see Sagitov [20] and Schweinsberg [21] (such models may be relevant for oysters and some fish species [7, 10]). In Birkner et al. [5] and in Schweinsberg [21], a natural one-parameter family of coalescent processes arises to describe the genealogy of such populations: the Beta coalescent with parameter $\alpha \in (1,2)$. Results from Berestycki et al. [2] give a consistent estimator, based on the observed number $K^{(n)}$ of different alleles, for the mutation rate $\theta$ of the DNA. This paper is a first step toward studying the convergence rate of this estimator or, equivalently, the asymptotic distribution of $K^{(n)}$. Results are also known for the asymptotic distribution of $K^{(n)}$ for other coalescent processes, see Drmota et al. [9] and Möhle [17].

For the Beta coalescent, the asymptotic distribution of $K^{(n)}$ depends on $\theta$ but also on the parameter $\alpha$. In particular, if the mutation rate of the DNA is known, the asymptotic distribution of $K^{(n)}$ allows one to deduce an estimate and a confidence interval for $\alpha$, which in a sense characterizes the size of a typical family according to [21].

1.2. The coalescent tree and mutation rate. We consider at time $t = 0$ a number $n \ge 1$ of individuals, and we look backward in time. Let $\mathcal{P}_n$ be the set of partitions of $\{1, \dots, n\}$. For $t \ge 0$, let $\Pi_t^{(n)}$ be an element of $\mathcal{P}_n$ such that each block of $\Pi_t^{(n)}$ corresponds to the initial individuals which have a common ancestor at time $-t$. We assume that if we consider $b$ blocks, $k$ of them merge into 1 at rate $\lambda_{b,k}$, independent of the current number of blocks. Using this property and the compatibility relation implied when one considers a larger number of initial individuals, Pitman [19], see also Sagitov [20] for a more biological approach, showed that the transition rates are given by

$$\lambda_{b,k} = \int_{(0,1)} x^{k-2}(1-x)^{b-k}\,\Lambda(dx), \qquad 2 \le k \le b,$$

for some finite measure $\Lambda$ on $[0,1]$, and that $\Pi^{(n)}$ is the restriction of the so-called coalescent process defined on the set of partitions of $\mathbb{N}^*$. The Kingman coalescent corresponds to the case where $\Lambda$ is the Dirac mass at 0, see [15]. In particular, in the Kingman coalescent, only two blocks merge at a time. The Bolthausen-Sznitman [6] coalescent corresponds to the case where $\Lambda$ is the Lebesgue measure on $[0,1]$. The Beta-coalescent introduced in Birkner et al. [5] and in Schweinsberg [21], see also Bertoin and Le Gall [4] and Berestycki et al. [1], corresponds to $\Lambda(dx) = C_0\, x^{1-\alpha}(1-x)^{\alpha-1}\mathbf{1}_{(0,1)}(x)\,dx$ for some constant $C_0 > 0$.

Notice $\Pi^{(n)} = (\Pi_t^{(n)}, t \ge 0)$ is a Markov process starting at the trivial partition of $\{1, \dots, n\}$ into $n$ singletons. We denote by $R_t^{(n)}$ the number of blocks of $\Pi_t^{(n)}$, that is the number of common ancestors alive at time $-t$. In particular we have $R_0^{(n)} = n$. We shall omit the superscript $(n)$ when there is no confusion. The process $R = (R_t, t \ge 0)$ is a continuous time Markov process taking values in $\mathbb{N}^*$. The number of possible choices of $\ell + 1$ blocks among $k$ is $\binom{k}{\ell+1}$ (for $1 \le \ell \le k-1$) and each group of $\ell+1$ blocks merges at rate $\lambda_{k,\ell+1}$. So the waiting time of $R$ in state $k$ is an exponential random variable with parameter

$$g_k = \sum_{\ell=1}^{k-1} \binom{k}{\ell+1} \lambda_{k,\ell+1} = \int_{(0,1)} \big(1 - (1-x)^k - kx(1-x)^{k-1}\big)\, x^{-2}\Lambda(dx) \tag{1}$$

and is distributed as $E/g_k$, where $E$ is an exponential random variable with mean 1.
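For the Beta$(2-\alpha,\alpha)$ case, the rates $\lambda_{b,k}$ and the total rate $g_k$ can be evaluated through Euler's Beta function, which makes the compatibility relation $\lambda_{b,k} = \lambda_{b+1,k} + \lambda_{b+1,k+1}$ easy to check numerically. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes that $\Lambda$ is the Beta$(2-\alpha,\alpha)$ probability distribution, so that $\lambda_{b,k} = B(k-\alpha, b-k+\alpha)/B(2-\alpha,\alpha)$.

```python
from math import comb, exp, lgamma


def beta_fn(a, b):
    """Euler Beta function B(a, b), evaluated through log-gamma."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))


def merger_rate(b, k, alpha):
    """lambda_{b,k}: rate at which a given group of k blocks among b merges.

    Assumption: Lambda is the Beta(2 - alpha, alpha) distribution, 1 < alpha < 2.
    """
    return beta_fn(k - alpha, b - k + alpha) / beta_fn(2 - alpha, alpha)


def total_rate(k, alpha):
    """g_k: sum over l = 1..k-1 of binom(k, l+1) * lambda_{k, l+1}."""
    return sum(comb(k, l + 1) * merger_rate(k, l + 1, alpha) for l in range(1, k))
```

Since the sketch uses exact binomial coefficients, it is meant for moderate values of $k$ only; for large $k$ the same sum is better evaluated in log space.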

The time at which the most recent common ancestor (MRCA) appears is $T_n = \inf\{t > 0;\ R_t = 1\}$.

Let $Y = (Y_k, k \ge 0)$ be the different states of the process $R$. It is defined by $Y_0 = R_0$ and for $k \ge 1$, $Y_k = R_{S_k}$, where the sequence of jumping times $(S_k, k \ge 0)$ is defined inductively by $S_0 = 0$ and for $k \ge 1$, $S_k = \inf\{t > S_{k-1};\ R_t \ne R_{S_{k-1}}\}$. We use the convention that $\inf \emptyset = +\infty$ and $Y_k = 1$ for $k \ge \tau_n$, where $\tau_n = \inf\{k;\ R_{S_k} = 1\}$ is the number of jumps of the process $R$ until it reaches the absorbing state 1. The number $\tau_n$ is the number of coalescences.

We shall write $Y^{(n)}$ instead of $Y$ when it is convenient to stress that $Y$ starts at time 0 at point $n$. Notice $Y$ is an $\mathbb{N}^*$-valued discrete time Markov chain, with transition probability

$$P(k, k-\ell) = \frac{\binom{k}{\ell+1}\lambda_{k,\ell+1}}{g_k}. \tag{2}$$

The sum of the lengths of all branches in the coalescent tree until the MRCA is distributed as

$$L^{(n)} = \sum_{k=0}^{\tau_n - 1} \frac{Y_k^{(n)}}{g_{Y_k^{(n)}}}\, E_k,$$

where $(E_k, k \ge 0)$ are independent exponential random variables with expectation 1.
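The representation of the total branch length through the jump chain $Y$ and independent exponential variables translates directly into a simulation. The Python sketch below is an illustration under stated assumptions: $\Lambda$ is taken to be the Beta$(2-\alpha,\alpha)$ distribution, and the helper names (`log_weight`, `jump_law`, `simulate_tree_length`) are hypothetical. Each step costs $O(k)$ operations, so it is meant for moderate $n$.

```python
import random
from math import exp, lgamma


def log_weight(k, l, alpha):
    """log of binom(k, l+1) * B(l+1-alpha, k-l-1+alpha), for a jump k -> k-l."""
    m = l + 1  # number of blocks that merge
    log_binom = lgamma(k + 1) - lgamma(m + 1) - lgamma(k - m + 1)
    log_beta = lgamma(m - alpha) + lgamma(k - m + alpha) - lgamma(k)
    return log_binom + log_beta


def jump_law(k, alpha):
    """Return (g_k, [P(k, k-l) for l = 1..k-1]) for Lambda = Beta(2-alpha, alpha)."""
    log_norm = lgamma(2 - alpha) + lgamma(alpha)  # log B(2-alpha, alpha)
    rates = [exp(log_weight(k, l, alpha) - log_norm) for l in range(1, k)]
    g_k = sum(rates)
    return g_k, [r / g_k for r in rates]


def simulate_tree_length(n, alpha, rng):
    """One draw of (tau_n, L^(n)): number of coalescences and total length."""
    blocks, jumps, length = n, 0, 0.0
    while blocks > 1:
        g_k, probs = jump_law(blocks, alpha)
        # Y_k / g_{Y_k} * E_k: all current branches grow during the waiting time
        length += blocks * rng.expovariate(1.0) / g_k
        lost = rng.choices(range(1, blocks), weights=probs)[0]
        blocks -= lost
        jumps += 1
    return jumps, length
```

For instance, `simulate_tree_length(100, 1.5, random.Random(0))` returns one sample of the pair $(\tau_n, L^{(n)})$ for $n = 100$.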

In the infinite allele model, one assumes that (neutral) mutations appear in the genealogy at random with rate $\theta$. In particular, by looking at the number $K^{(n)}$ of different alleles among $n$ individuals, one gets the number of mutations which occurred in the genealogy of those individuals after the most recent common ancestor. In particular, conditionally on the length of the coalescent tree $L^{(n)}$, the number $K^{(n)}$ of mutations is distributed according to a Poisson r.v. with parameter $\theta L^{(n)}$. Therefore, we have that $\dfrac{K^{(n)} - \theta L^{(n)}}{\sqrt{\theta L^{(n)}}}$ converges in distribution to a standard Gaussian r.v. (with mean 0 and variance 1). If the asymptotic distribution of $L^{(n)}$ is known, one can deduce the asymptotic distribution of $K^{(n)}$.

1.3. Known results.

1.3.1. Kingman coalescence. For Kingman coalescence, a coalescence corresponds to the appearance of a common ancestor of only two individuals. In particular, we have for $0 \le k \le n-1$, $Y_k^{(n)} = n - k$. Thus we get $\tau_n = n - 1$ as well as $g_{Y_k^{(n)}} = (n-k)(n-k-1)/2$. We also have

$$L^{(n)} = 2\sum_{k=0}^{n-2} \frac{E_k}{n-k-1} = 2\sum_{k=1}^{n-1} \frac{E_{n-k-1}}{k}.$$

The r.v. $L^{(n)}/2$ is distributed as the sum of independent exponential r.v. with parameter 1 to $n-1$, that is as the maximum of $n-1$ independent exponential r.v. with mean 1, see Feller [11] section 1.6. An easy computation gives that $L^{(n)}/(2\log(n))$ converges in probability to 1 and that $\dfrac{L^{(n)}}{2} - \log(n)$ converges in distribution to the Gumbel distribution (with density $e^{-x-e^{-x}}$) when $n$ goes to infinity.

It is then easy to deduce that $\dfrac{K^{(n)} - \theta E[L^{(n)}]}{\sqrt{\theta E[L^{(n)}]}}$ converges in distribution to the standard Gaussian distribution. This provides the weak convergence and the asymptotic normality of the Watterson [22] estimator of $\theta$: $\dfrac{K^{(n)}}{E[L^{(n)}]} = \dfrac{K^{(n)}}{2\sum_{k=1}^{n-1} \frac{1}{k}}$. See also the appendix in [9].
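The Kingman case can be illustrated by a short simulation of the Watterson estimator. The Python sketch below relies on the description above, namely $L^{(n)} = 2\sum_{k=1}^{n-1} E_k/k$ in distribution and $K^{(n)}$ Poisson with parameter $\theta L^{(n)}$ given $L^{(n)}$; the function names and the elementary Poisson sampler are assumptions made for the illustration.

```python
import random


def kingman_tree_length(n, rng):
    """Draw L^(n) = 2 * sum_{k=1}^{n-1} E_k / k with E_k i.i.d. Exp(1)."""
    return 2.0 * sum(rng.expovariate(1.0) / k for k in range(1, n))


def poisson(mean, rng):
    """Poisson draw: count arrivals of a rate-1 Poisson process on [0, mean)."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def watterson_estimate(n, theta, rng):
    """Simulate K^(n) and return the estimator K^(n) / E[L^(n)]."""
    length = kingman_tree_length(n, rng)
    k_n = poisson(theta * length, rng)
    expected_length = 2.0 * sum(1.0 / k for k in range(1, n))
    return k_n / expected_length
```

Averaging `watterson_estimate(n, theta, rng)` over many replicates should give a value close to `theta`, since $E[K^{(n)}] = \theta E[L^{(n)}]$; the slow $1/\sqrt{\log(n)}$ rate shows up in the spread of individual estimates.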

1.3.2. Bolthausen-Sznitman coalescence. In Drmota et al. [9], the authors consider the Bolthausen-Sznitman coalescence: $\Lambda$ is the Lebesgue measure on $[0,1]$. In this case they prove that $\dfrac{\log(n)}{n} L^{(n)}$ converges in probability to 1 and that $\dfrac{L^{(n)} - a_n}{b_n}$ converges in distribution to a stable r.v. $Z$ with Laplace transform $E[e^{-\lambda Z}] = e^{\lambda \log(\lambda)}$ for $\lambda > 0$, where

$$a_n = \frac{n}{\log(n)} + \frac{n \log(\log(n))}{\log(n)^2} \quad\text{and}\quad b_n = \frac{n}{\log(n)^2}.$$

It is then easy to deduce that $\dfrac{K^{(n)} - \theta a_n}{\theta b_n}$ converges to $Z$.

1.3.3. The case $\int_0^1 x^{-1}\Lambda(dx) < \infty$. In Möhle [17], the author investigates the case where $x^{-1}\Lambda(dx)$ is a finite measure and considers directly the asymptotic distribution of $K^{(n)}$. In particular he gets that $K^{(n)}/(n\theta)$ converges in distribution to a non-negative r.v. $Z$ uniquely determined by its moments: for $k \ge 1$,

$$E[Z^k] = \frac{k!}{\prod_{i=1}^{k} \Phi(i)}, \quad\text{with}\quad \Phi(i) = \int_0^1 \big(1 - (1-x)^i\big)\, x^{-2}\Lambda(dx).$$

There is an equation in law for $Z$ when $\Lambda$ is a simple measure, that is when $\int_0^1 x^{-2}\Lambda(dx) < \infty$.
1.3.4. Beta coalescent. The Beta-coalescent corresponds to the case where $\Lambda$ is the Beta$(2-\alpha, \alpha)$ distribution, with $\alpha \in (1,2)$: $\Lambda(dx) = \dfrac{1}{\Gamma(2-\alpha)\Gamma(\alpha)}\, x^{1-\alpha}(1-x)^{\alpha-1}\,dx$. The Kingman coalescent can be viewed as the asymptotic case $\alpha = 2$ and the Bolthausen-Sznitman coalescence as the asymptotic case $\alpha = 1$.

The first order asymptotic behavior of $L^{(n)}$ is given in [2], theorem 1.9: $n^{\alpha-2}L^{(n)}$ converges in probability to $\dfrac{\Gamma(\alpha)\,\alpha(\alpha-1)}{2-\alpha}$. We shall now investigate the asymptotic distribution of $L^{(n)}$.

1.4. Main result. In this paper we shall state a partial result concerning the asymptotic distribution of $L^{(n)}$. We shall only give the asymptotic distribution of the total length of the coalescent tree up to the $\lfloor nt \rfloor$-th coalescence:

$$L_t^{(n)} = \sum_{k=0}^{\lfloor nt \rfloor \wedge (\tau_n - 1)} \frac{Y_k^{(n)}}{g_{Y_k^{(n)}}}\, E_k, \tag{3}$$

where $\lfloor x \rfloor$ is the largest integer smaller than or equal to $x$ for $x \ge 0$.

We say $g = O(f)$, where $f$ is a non-negative function and $g$ a real valued function defined on a set $E$ (mainly here $E = [0,1]$ or $E = \mathbb{N}^*$ or $E = \mathbb{N}^* \times [0,1]$), if there exists a finite constant $C > 0$ such that $|g(x)| \le C f(x)$ for all $x \in E$.

Let $\nu(dx) = x^{-2}\Lambda(dx)$ and $\rho(t) = \nu((t,1])$. We assume that $\rho(t) = C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})$ for some $\alpha \in (1,2)$, $C_0 > 0$ and $\zeta > 1 - 1/\alpha$. This includes the Beta$(2-\alpha,\alpha)$ distribution for $\Lambda$. We have, see Lemma 2.2, that

$$g_n = C_0\Gamma(2-\alpha)\, n^{\alpha} + O\big(n^{\alpha - \min(\zeta,1)}\big).$$

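Let $\gamma = \alpha - 1$. Let $V = (V_t, t \ge 0)$ be an $\alpha$-stable Lévy process with no positive jumps (see chap. VII in [3]) with Laplace exponent $\psi(u) = u^{\alpha}/\gamma$: for all $u \ge 0$, $E[e^{-uV_t}] = e^{t u^{\alpha}/\gamma}$. We first give in Proposition 3.1 the asymptotics for the number of coalescences, $\tau_n$:

$$n^{-1/\alpha}\Big(n - \frac{\tau_n}{\gamma}\Big) \xrightarrow[n\to\infty]{(d)} V_\gamma.$$

See also Gnedin and Yakubovich [12] and Iksanov and Möhle [13] for different proofs of this result under slightly different or stronger hypotheses. Then we give the asymptotics of $\hat L_t^{(n)}$, defined as $C_0\Gamma(2-\alpha) L_t^{(n)}$ but with the exponential r.v. $E_k$ replaced by their mean, that is 1, and $g_{Y_k^{(n)}}$ replaced by its equivalent $C_0\Gamma(2-\alpha)\big(Y_k^{(n)}\big)^{\alpha}$:

$$\hat L_t^{(n)} = \sum_{k=0}^{\lfloor nt \rfloor \wedge (\tau_n - 1)} \big(Y_k^{(n)}\big)^{1-\alpha}. \tag{4}$$

For $t \in [0,\gamma]$, we set

$$v(t) = \int_0^t \Big(1 - \frac{r}{\gamma}\Big)^{-\gamma}\,dr.$$

Theorem 5.1 gives that the following convergence in distribution holds for all $t \in (0,\gamma)$:

$$n^{-1+\alpha-1/\alpha}\big(\hat L_t^{(n)} - n^{2-\alpha} v(t)\big) \xrightarrow[n\to\infty]{(d)} (\alpha-1)\int_0^t dr\, \Big(1 - \frac{r}{\gamma}\Big)^{-\alpha} V_r. \tag{5}$$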
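Then we deduce our main result, Theorem 6.1. Let $\alpha \in \big(1, \frac{1+\sqrt5}{2}\big)$. For all $t \in (0,\gamma)$, we have the following convergence in distribution

$$n^{-1+\alpha-1/\alpha}\Big(L_t^{(n)} - n^{2-\alpha}\frac{v(t)}{C_0\Gamma(2-\alpha)}\Big) \xrightarrow[n\to\infty]{(d)} \frac{\alpha-1}{C_0\Gamma(2-\alpha)}\int_0^t dr\,\Big(1 - \frac{r}{\gamma}\Big)^{-\alpha} V_r.$$

We also have that $n^{\alpha-2}L_t^{(n)}$ converges in probability to $\dfrac{v(t)}{C_0\Gamma(2-\alpha)}$ for $\alpha \in (1,2)$. For $t = \gamma$, intuitively we have $L_\gamma^{(n)}$ close to $L^{(n)}$, as $\tau_n$ is close to $n\gamma$. In particular, one expects that $n^{\alpha-2}L^{(n)}$ converges in probability to $\dfrac{v(\gamma)}{C_0\Gamma(2-\alpha)} = \dfrac{\gamma}{(2-\alpha)\,C_0\Gamma(2-\alpha)}$. For the Beta-coalescent, $\Lambda(dx) = \dfrac{x^{1-\alpha}(1-x)^{\alpha-1}}{\Gamma(2-\alpha)\Gamma(\alpha)}\,dx$, we have $C_0 = 1/\big(\alpha\Gamma(2-\alpha)\Gamma(\alpha)\big)$ and indeed, theorem 1.9 in [2] gives that $n^{\alpha-2}L^{(n)}$ converges in probability to $\dfrac{\Gamma(\alpha)\,\alpha(\alpha-1)}{2-\alpha} = \dfrac{\gamma}{(2-\alpha)\,C_0\Gamma(2-\alpha)}$. Notice theorem 1.9 in [2] is stated for more general coalescents than the Beta-coalescent.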
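The centering function $v(t) = \int_0^t (1 - r/\gamma)^{-\gamma}\,dr$ has the elementary closed form $v(t) = \frac{\gamma}{2-\alpha}\big(1 - (1 - t/\gamma)^{2-\alpha}\big)$, which gives $v(\gamma) = \gamma/(2-\alpha)$. The Python sketch below (hypothetical function names, not part of the original argument) compares this closed form with a midpoint quadrature, and computes the first-order constant $v(\gamma)/(C_0\Gamma(2-\alpha))$ for the Beta-coalescent, assuming $C_0 = 1/(\alpha\Gamma(2-\alpha)\Gamma(\alpha))$.

```python
from math import gamma


def v_closed(t, alpha):
    """Closed form of v(t) = int_0^t (1 - r/g)^(-g) dr, with g = alpha - 1."""
    g = alpha - 1.0
    base = max(0.0, 1.0 - t / g)  # guard against rounding below zero at t = g
    return g / (2.0 - alpha) * (1.0 - base ** (2.0 - alpha))


def v_midpoint(t, alpha, steps=100_000):
    """Midpoint-rule approximation of the same integral (for t < alpha - 1)."""
    g = alpha - 1.0
    h = t / steps
    return h * sum((1.0 - (i + 0.5) * h / g) ** (-g) for i in range(steps))


def beta_first_order_constant(alpha):
    """v(gamma) / (C_0 * Gamma(2 - alpha)) for the Beta(2-alpha, alpha) case."""
    c0_times_gamma = 1.0 / (alpha * gamma(alpha))  # C_0 * Gamma(2 - alpha)
    return v_closed(alpha - 1.0, alpha) / c0_times_gamma
```

With $\alpha = 1.5$, `beta_first_order_constant(1.5)` agrees with $\Gamma(\alpha)\alpha(\alpha-1)/(2-\alpha)$, the constant quoted from theorem 1.9 in [2].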

In Corollary 6.2 we give the asymptotic distribution of the number $K_t^{(n)}$ of mutations on the coalescent tree up to the $\lfloor nt \rfloor$-th coalescence for $\alpha \in (1,2)$. In particular, for $\alpha > \frac{1+\sqrt5}{2}$, the approximations of the exponential r.v. by their mean are more important than the fluctuations of $\hat L_t^{(n)}$, and the asymptotic distribution is Gaussian.
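The threshold $\frac{1+\sqrt5}{2}$ is the value of $\alpha$ at which the exponent $-1 + \alpha - 1/\alpha$ of the normalisation $n^{-1+\alpha-1/\alpha}$ changes sign, since $-1+\alpha-1/\alpha = 0$ is equivalent to $\alpha^2 - \alpha - 1 = 0$. A minimal Python check, with hypothetical names:

```python
from math import sqrt

# Positive root of alpha^2 - alpha - 1 = 0 (the golden ratio).
ALPHA_0 = (1.0 + sqrt(5.0)) / 2.0


def fluctuation_exponent(alpha):
    """Exponent -1 + alpha - 1/alpha of n in the normalisation of L_t^(n)."""
    return -1.0 + alpha - 1.0 / alpha
```

The exponent is negative for $\alpha \in (1, \alpha_0)$ and positive for $\alpha \in (\alpha_0, 2)$, which separates the two regimes described above.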



1.5. Organization of the paper. In Section 2 we give estimates (distribution, Laplace transform) for the number of collisions in the first coalescence in a population of $n$ individuals. We prove the asymptotic distribution of the number of collisions, $\tau_n$, in Section 3, as well as an invariance principle for the coalescent process, see Corollary 3.5. In Section 4 we give error bounds on the approximation of $L_t^{(n)}$ by $\hat L_t^{(n)}/(C_0\Gamma(2-\alpha))$. Section 5 is devoted to the asymptotic distribution of $\hat L_t^{(n)}$. Finally, our main result, Theorem 6.1 on the asymptotic distribution of $L_t^{(n)}$, and Corollary 6.2 on the asymptotic distribution of the number of mutations $K_t^{(n)}$, and their proofs are given in Section 6.

In what follows, $c$ is an unimportant constant whose value may vary from line to line.



2. Law of the first jump

Let $Y$ be a discrete time Markov chain on $\mathbb{N}^*$ with transition kernel $P$ given by (2) and started at $Y_0 = n$. Let $X_k^{(n)} = Y_{k-1} - Y_k$ for $k \ge 1$. We give some estimates on the moments of $X_1^{(n)}$ and its Laplace transform.

For $n \ge 1$, $x \in (0,1)$, let $B_{n,x}$ be a binomial r.v. with parameter $(n,x)$. Recall that for $1 \le k \le n$, we have

$$P(B_{n,x} \ge k) = \frac{n!}{(k-1)!\,(n-k)!} \int_0^x t^{k-1}(1-t)^{n-k}\,dt. \tag{7}$$
Recall that $\nu(dx) = x^{-2}\Lambda(dx)$ and $\rho(t) = \nu((t,1])$. Use the first equality in (1) and (7) to get

$$g_n = \int_0^1 \sum_{k=2}^{n} \binom{n}{k} x^k (1-x)^{n-k}\,\nu(dx) = \int_0^1 P(B_{n,x} \ge 2)\,\nu(dx) = n(n-1)\int_0^1 (1-t)^{n-2}\, t\rho(t)\,dt. \tag{8}$$

Notice also that $P(X_1^{(n)} = k) = P(n, n-k) = \dfrac{1}{g_n}\displaystyle\int_0^1 P(B_{n,x} = k+1)\,\nu(dx)$ and thus

$$P(X_1^{(n)} \ge k) = \frac{\int_0^1 P(B_{n,x} \ge k+1)\,\nu(dx)}{g_n} = \frac{(n-2)!}{k!\,(n-k-1)!}\, \frac{\int_0^1 (1-t)^{n-k-1}\, t^{k} \rho(t)\,dt}{\int_0^1 (1-t)^{n-2}\, t\rho(t)\,dt}. \tag{9}$$
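The computations above rest on the classical identity expressing a binomial tail as an incomplete Beta integral, $P(B_{n,x} \ge k) = \frac{n!}{(k-1)!(n-k)!}\int_0^x t^{k-1}(1-t)^{n-k}\,dt$. The Python sketch below checks it numerically with a composite Simpson rule; the function names and the number of quadrature steps are choices made for the illustration.

```python
from math import comb, factorial


def binomial_tail(n, x, k):
    """P(B_{n,x} >= k) computed directly from the binomial probabilities."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def incomplete_beta_form(n, x, k, steps=2000):
    """Same probability through n!/((k-1)!(n-k)!) * int_0^x t^(k-1)(1-t)^(n-k) dt.

    The integral is approximated with a composite Simpson rule (steps is even).
    """
    h = x / steps

    def f(t):
        return t ** (k - 1) * (1 - t) ** (n - k)

    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    integral = total * h / 3
    return factorial(n) / (factorial(k - 1) * factorial(n - k)) * integral
```

Taking $k = 2$ in this identity and integrating against $\nu(dx)$ recovers the expression of $g_n$ as $n(n-1)\int_0^1 (1-t)^{n-2} t\rho(t)\,dt$.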

Let $\alpha \in (1,2)$ and $\gamma = \alpha - 1$.

We say $g = o(f)$, where $f$ is a non-negative function and $g$ a real valued function defined on $(0,1]$, if for any $\varepsilon > 0$, there exists $x_0 > 0$ s.t. $|g(x)| \le \varepsilon f(x)$ for all $x \in (0, x_0]$.

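Lemma 2.1. Assume that $\rho(t) = C_0 t^{-\alpha} + o(t^{-\alpha})$. Then $(X_1^{(n)}, n \ge 2)$ converges in distribution to the r.v. $X$ such that for all $k \ge 1$,

$$P(X \ge k) = \frac{1}{\Gamma(2-\alpha)}\, \frac{\Gamma(k+1-\alpha)}{k!}.$$

We have $E[X] = 1/\gamma$, $E[X^2] = +\infty$ and its Laplace transform $\phi$ is given by: for $u \ge 0$,

$$\phi(u) = E[e^{-uX}] = 1 + \frac{e^{u} - 1}{\alpha - 1}\Big[(1 - e^{-u})^{\alpha-1} - 1\Big].$$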
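The limit law of Lemma 2.1 is explicit enough to be checked numerically. The Python sketch below (hypothetical function names) computes the tail $P(X \ge k) = \Gamma(k+1-\alpha)/(\Gamma(2-\alpha)\,k!)$, the point masses as differences of tails, and compares the series $\sum_{k\ge1} e^{-uk}P(X=k)$, truncated at an arbitrary number of terms, with the closed form $1 + \frac{e^u-1}{\alpha-1}[(1-e^{-u})^{\alpha-1} - 1]$.

```python
from math import exp, lgamma


def tail(k, alpha):
    """P(X >= k) = Gamma(k + 1 - alpha) / (Gamma(2 - alpha) * k!)."""
    return exp(lgamma(k + 1 - alpha) - lgamma(2 - alpha) - lgamma(k + 1))


def pmf(k, alpha):
    """P(X = k) = P(X >= k) - P(X >= k + 1)."""
    return tail(k, alpha) - tail(k + 1, alpha)


def laplace_series(u, alpha, terms=2000):
    """Truncated series sum_{k=1}^{terms} exp(-u k) P(X = k), for u > 0."""
    return sum(exp(-u * k) * pmf(k, alpha) for k in range(1, terms + 1))


def laplace_closed_form(u, alpha):
    """1 + (e^u - 1)/(alpha - 1) * ((1 - e^{-u})^(alpha - 1) - 1)."""
    return 1.0 + (exp(u) - 1.0) / (alpha - 1.0) * ((1.0 - exp(-u)) ** (alpha - 1.0) - 1.0)
```

For small $u$ the closed form behaves like $1 - u/\gamma + u^{\alpha}/\gamma$, which is the expansion used later for the Laplace transform of $X_1^{(n)}$.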

We shall use repeatedly the identity of the beta distribution: for $a > 0$ and $b > 0$, we have

$$\int_0^1 t^{a-1}(1-t)^{b-1}\,dt = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}. \tag{10}$$



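Proof. The condition $\rho(t) = C_0 t^{-\alpha} + o(t^{-\alpha})$ implies that for fixed $k \ge 1$, as $n$ goes to infinity, we have

$$\int_0^1 (1-t)^{n-k-1}\, t^{k}\rho(t)\,dt = C_0\, \frac{\Gamma(k+1-\alpha)\Gamma(n-k)}{\Gamma(n+1-\alpha)}\,(1 + o(1)).$$

Therefore, we get that

$$\lim_{n\to\infty} P(X_1^{(n)} \ge k) = \lim_{n\to\infty} \frac{(n-2)!}{k!\,(n-k-1)!}\, \frac{\int_0^1 (1-t)^{n-k-1}\, t^{k}\rho(t)\,dt}{\int_0^1 (1-t)^{n-2}\, t\rho(t)\,dt}
= \lim_{n\to\infty} \frac{(n-2)!}{k!\,(n-k-1)!}\, \frac{\Gamma(k+1-\alpha)\Gamma(n-k)}{\Gamma(n+1-\alpha)}\, \frac{\Gamma(n+1-\alpha)}{\Gamma(2-\alpha)\Gamma(n-1)}
= \frac{1}{\Gamma(2-\alpha)}\, \frac{\Gamma(k+1-\alpha)}{k!}.$$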
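This ends the first part of the Lemma. Notice that

$$P(X \ge k) = \frac{1}{\Gamma(\alpha)\Gamma(2-\alpha)} \int_0^1 t^{k-\alpha}(1-t)^{\alpha-1}\,dt$$

and as $P(X = k) = P(X \ge k) - P(X \ge k+1)$, we get

$$P(X = k) = \frac{1}{\Gamma(\alpha)\Gamma(2-\alpha)} \int_0^1 t^{k-\alpha}(1-t)^{\alpha}\,dt = \frac{\alpha}{\Gamma(2-\alpha)}\, \frac{\Gamma(k+1-\alpha)}{(k+1)!}. \tag{11}$$

We have

$$E[X] = \sum_{k\ge1} P(X \ge k) = \frac{1}{\Gamma(\alpha)\Gamma(2-\alpha)} \int_0^1 t^{1-\alpha}(1-t)^{\alpha-2}\,dt = \frac{1}{\Gamma(\alpha)\Gamma(2-\alpha)}\, \frac{\Gamma(2-\alpha)\Gamma(\alpha-1)}{\Gamma(1)} = \frac{1}{\alpha-1}.$$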

The asymptotic expansion 



(12) 



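implies $P(X = k) \sim_{k\to+\infty} \dfrac{\alpha}{\Gamma(2-\alpha)}\, k^{-\alpha-1}$. Therefore we have $E[X^2] = +\infty$. We compute the Laplace transform of $X$. Let $u \ge 0$; we have

$$\phi(u) = \frac{\alpha}{\Gamma(2-\alpha)} \sum_{k\ge1} \frac{\Gamma(k+1-\alpha)}{(k+1)!}\, e^{-uk}
= \frac{\alpha\, e^{u}}{\Gamma(2-\alpha)} \int_0^\infty x^{-1-\alpha}\, e^{-x}\big(e^{x e^{-u}} - x e^{-u} - 1\big)\,dx
= 1 + \frac{e^{u}-1}{\alpha-1}\Big[(1-e^{-u})^{\alpha-1} - 1\Big],$$

where we used (11) with $\Gamma(k+1-\alpha) = \displaystyle\int_0^\infty x^{k-\alpha} e^{-x}\,dx$ for the first equality and two integrations by parts for the last. □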
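We give bounds on $g_n$.

Lemma 2.2. Assume that $\rho(t) = C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})$ for some $C_0 > 0$ and $\zeta > 0$. Then we have, for $n \ge 2$,

$$g_n = C_0\Gamma(2-\alpha)\, n^{\alpha} + O\big(n^{\alpha - \min(\zeta,1)}\big). \tag{13}$$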
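For the Beta$(2-\alpha,\alpha)$ distribution one has $C_0 = 1/(\alpha\Gamma(2-\alpha)\Gamma(\alpha))$, so the leading term in (13) is $n^{\alpha}/\Gamma(\alpha+1)$. The Python sketch below (hypothetical function names; the Beta case is an assumption of the example) evaluates $g_n = \sum_{k=2}^{n}\binom{n}{k}\lambda_{n,k}$ in log space and compares it with this leading term.

```python
from math import exp, lgamma


def total_rate_beta(n, alpha):
    """g_n = sum_{k=2}^n binom(n, k) * lambda_{n,k} for Lambda = Beta(2-alpha, alpha).

    Each term is assembled in log space to avoid overflow for large n.
    """
    log_norm = lgamma(2 - alpha) + lgamma(alpha)  # log B(2 - alpha, alpha)
    total = 0.0
    for k in range(2, n + 1):
        log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        log_beta = lgamma(k - alpha) + lgamma(n - k + alpha) - lgamma(n)
        total += exp(log_binom + log_beta - log_norm)
    return total


def leading_term(n, alpha):
    """C_0 * Gamma(2 - alpha) * n^alpha = n^alpha / Gamma(alpha + 1) in the Beta case."""
    return n**alpha / exp(lgamma(alpha + 1))
```

The ratio `total_rate_beta(n, alpha) / leading_term(n, alpha)` approaches 1 as $n$ grows, in line with the $O(n^{\alpha-\min(\zeta,1)})$ remainder.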

Proof. Notice that

$$g_n = n(n-1)\int_0^1 (1-t)^{n-2}\, t\,\big(C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})\big)\,dt = C_0\, n(n-1)\, \frac{\Gamma(2-\alpha)\Gamma(n-1)}{\Gamma(n+1-\alpha)} + h_n,$$

where $h_n = n(n-1)\displaystyle\int_0^1 (1-t)^{n-2}\, t^{1-\alpha+\zeta}\, O(1)\,dt$. In particular, using (12), we have for $n \ge 2$

$$|h_n| \le c\, n(n-1)\int_0^1 (1-t)^{n-2}\, t^{1-\alpha+\zeta}\,dt = c\, n(n-1)\, \frac{\Gamma(2-\alpha+\zeta)\Gamma(n-1)}{\Gamma(n+1-\alpha+\zeta)} \le c\, n^{\alpha-\zeta}.$$

Using (12) again, we get that $\Gamma(n-1)/\Gamma(n+1-\alpha) = n^{\alpha-2} + O(n^{\alpha-3})$. This implies that

$$g_n = C_0\Gamma(2-\alpha)\, n^{\alpha} + O\big(n^{\alpha-\min(\zeta,1)}\big).$$

□



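We give an expansion of the first moment of $X_1^{(n)}$.

Lemma 2.3. Assume that $\rho(t) = C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})$ for some $C_0 > 0$ and $\zeta > 0$. Let $\varepsilon_0 > 0$. We set

$$\varphi_n = \begin{cases} n^{-\zeta} & \text{if } \zeta < \alpha - 1,\\ n^{1-\alpha+\varepsilon_0} & \text{if } \zeta = \alpha - 1,\\ n^{1-\alpha} & \text{if } \zeta > \alpha - 1. \end{cases} \tag{14}$$

There exists a constant $C_{(15)}$ s.t. for all $n \ge 2$, we have

$$\Big| E[X_1^{(n)}] - \frac{1}{\gamma} \Big| \le C_{(15)}\, \varphi_n. \tag{15}$$

Proof. We have

$$E[X_1^{(n)}] = \sum_{k\ge1} P(X_1^{(n)} \ge k) = \frac{\int_0^1 \sum_{k\ge1} P(B_{n,x} \ge k+1)\,\nu(dx)}{g_n}$$

$$= \frac{\int_0^1 \big(E[B_{n,x}] - P(B_{n,x} \ge 1)\big)\,\nu(dx)}{g_n} \tag{16}$$

$$= \frac{\int_0^1 nx\,\nu(dx) - \int_0^1 \big(1 - (1-x)^n\big)\,\nu(dx)}{g_n}$$

$$= \frac{n\int_0^1 \big[1 - (1-t)^{n-1}\big]\rho(t)\,dt}{g_n} \tag{17}$$

$$= \frac{\int_0^1 (1-t)^{n-2}\big(\int_t^1 \rho(r)\,dr\big)\,dt}{\int_0^1 (1-t)^{n-2}\, t\rho(t)\,dt},$$

using (9) for the first equality and (8) for the last. Notice that

$$\int_t^1 \rho(r)\,dr = \frac{1}{\gamma}\, t\rho(t) + O(1) + \int_t^1 O(r^{-\alpha+\zeta})\,dr + O(t^{-\alpha+\zeta+1})$$

$$= \frac{1}{\gamma}\, t\rho(t) + O\big(t^{\min(-\alpha+\zeta+1,\,0)}\big) + O(|\log(t)|)\,\mathbf{1}_{\{\alpha-\zeta=1\}}$$

$$= \frac{1}{\gamma}\, t\rho(t) + O\big(t^{\min(-\alpha+\zeta+1,\,0)}\big) + O(t^{-\varepsilon_0})\,\mathbf{1}_{\{\alpha-\zeta=1\}}.$$

This implies that

$$E[X_1^{(n)}] = \frac{1}{\gamma} + \frac{n(n-1)}{g_n} \int_0^1 (1-t)^{n-2}\Big(O\big(t^{\min(-\alpha+\zeta+1,\,0)}\big) + O(t^{-\varepsilon_0})\,\mathbf{1}_{\{\alpha-\zeta=1\}}\Big)\,dt.$$

Using (10), (12) and Lemma 2.2, we get

$$\Big| E[X_1^{(n)}] - \frac{1}{\gamma} \Big| \le c\, \frac{n^2}{g_n} \int_0^1 (1-t)^{n-2}\Big(t^{\min(-\alpha+\zeta+1,\,0)} + t^{-\varepsilon_0}\,\mathbf{1}_{\{\alpha-\zeta=1\}}\Big)\,dt \le c\,\varphi_n.$$

□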



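We give an upper bound for the second moment of $X_1^{(n)}$.

Lemma 2.4. Assume that $\rho(t) = O(t^{-\alpha})$. Then there exists a constant $C_{(18)}$ s.t. for all $n \ge 2$, we have

$$E\big[(X_1^{(n)})^2\big] \le C_{(18)}\, \frac{n^2}{g_n}. \tag{18}$$

Proof. Using the identity $E[Y^2] = \sum_{k\ge1} (2k-1)P(Y \ge k)$ for $\mathbb{N}$-valued random variables, we get

$$E\big[(X_1^{(n)})^2\big] = \frac{\int_0^1 \sum_{k\ge1} (2k-1)\,P(B_{n,x} \ge k+1)\,\nu(dx)}{g_n}$$

$$= \frac{\int_0^1 \Big(\sum_{k\ge1} \big(2(k+1)-1\big)P(B_{n,x} \ge k+1) - 2\sum_{k\ge1} P(B_{n,x} \ge k+1)\Big)\,\nu(dx)}{g_n}$$

$$= \frac{\int_0^1 \big(E[B_{n,x}^2] - 2E[B_{n,x}] + P(B_{n,x} \ge 1)\big)\,\nu(dx)}{g_n}$$

$$= \frac{\int_0^1 \big(E[B_{n,x}^2] - E[B_{n,x}]\big)\,\nu(dx)}{g_n} - E[X_1^{(n)}]$$

$$= 2n(n-1)\, \frac{\int_0^1 t\rho(t)\,dt}{g_n} - E[X_1^{(n)}],$$

where we have used (16) for the fourth equality. Use $\int_0^1 t\rho(t)\,dt < \infty$ and $E[X_1^{(n)}] \ge 0$ to conclude. □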



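We consider $\phi_n$ the Laplace transform of $X_1^{(n)}$: for $u \ge 0$, $\phi_n(u) = E[e^{-uX_1^{(n)}}]$.

Lemma 2.5. Assume that $\rho(t) = C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})$ for some $C_0 > 0$ and $\zeta > 0$. Let $\varepsilon_0 > 0$. Recall $\varphi_n$ given by (14). Then we have, for $n \ge 2$,

$$\phi_n(u) = 1 - \frac{u}{\gamma} + \frac{u^{\alpha}}{\gamma} + R(n,u), \tag{19}$$

where $R(n,u) = (u\varphi_n + u^2)\, h(n,u)$ with $\sup_{u\in[0,K],\, n\ge2} |h(n,u)| < \infty$.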
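Proof. We have

$$\phi_n(u) = \sum_{k=1}^{n-1} e^{-uk}\, P(X_1^{(n)} = k) = \sum_{k=1}^{n-1} e^{-uk}\, P(X_1^{(n)} \ge k) - \sum_{k=2}^{n} e^{-u(k-1)}\, P(X_1^{(n)} \ge k)$$

$$= e^{-u} + \sum_{k=2}^{n-1} e^{-uk}(1 - e^{u})\, P(X_1^{(n)} \ge k)$$

$$= e^{-u} + (1 - e^{u}) \sum_{k=2}^{n-1} e^{-uk}\, \frac{1}{g_n} \int_0^1 \frac{n!}{k!\,(n-k-1)!}\, t^{k}(1-t)^{n-k-1}\rho(t)\,dt$$

$$= e^{-u} + (1 - e^{u})\, \frac{n}{g_n} \int_0^1 \Big[\big(1 - t(1-e^{-u})\big)^{n-1} - (1-t)^{n-1} - (n-1)\,e^{-u}\, t(1-t)^{n-2}\Big]\rho(t)\,dt$$

$$= 1 + (1 - e^{u})\, \frac{n}{g_n} \int_0^1 \Big[\big(1 - t(1-e^{-u})\big)^{n-1} - (1-t)^{n-1}\Big]\rho(t)\,dt,$$

where we used (8) for the last equality. Using (17), this implies

$$\phi_n(u) = 1 + (1 - e^{u})\, \frac{n}{g_n}\, A + (1 - e^{u})\, E[X_1^{(n)}] \tag{20}$$

with $A = \displaystyle\int_0^1 \Big[\big(1 - t(1-e^{-u})\big)^{n-1} - 1\Big]\rho(t)\,dt$.

Thanks to Lemma 2.3, we have that

$$(1 - e^{u})\, E[X_1^{(n)}] = -\frac{u}{\gamma} + (u^2 + u\varphi_n)\, h_1(n,u), \tag{21}$$

where $\sup_{u\in[0,K],\, n\ge2} |h_1(n,u)| < \infty$.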

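To compute $A$, we set $a = (1 - e^{-u})$ and $f(t) = t^{-\max(\alpha-1-\zeta,\,0)} + t^{-\varepsilon_0}\,\mathbf{1}_{\{\alpha-\zeta=1\}}$. An integration by parts gives

$$A = -a(n-1)\int_0^1 (1-at)^{n-2}\Big(\int_t^1 \rho(r)\,dr\Big)\,dt = -a(n-1)\int_0^1 (1-at)^{n-2}\Big(\frac{C_0}{\gamma}\, t^{1-\alpha} + O(f(t))\Big)\,dt = -A_1 + A_2,$$

with $A_1 = \dfrac{a(n-1)}{\gamma}\, C_0 \displaystyle\int_0^1 (1-at)^{n-2}\, t^{1-\alpha}\,dt$ and $A_2 = a(n-1)\displaystyle\int_0^1 (1-at)^{n-2}\, O(f(t))\,dt$. We have

$$A_1 = \frac{a^{\alpha-1}(n-1)}{\gamma}\, C_0 \int_0^{a} (1-t)^{n-2}\, t^{1-\alpha}\,dt = \frac{a^{\alpha-1}(n-1)}{\gamma}\, C_0 \int_0^{1} (1-t)^{n-2}\, t^{1-\alpha}\,dt - \frac{a^{\alpha-1}(n-1)}{\gamma}\, C_0 \int_a^{1} (1-t)^{n-2}\, t^{1-\alpha}\,dt,$$

where, by (10), $\displaystyle\int_0^1 (1-t)^{n-2}\, t^{1-\alpha}\,dt = \frac{\Gamma(n-1)\Gamma(2-\alpha)}{\Gamma(n+1-\alpha)}$.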

Since a > 0, we have for u G [0, ii'] and n > 2 

< i / (1 - 1) t < / (1 - < - . 

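Using (12) and Lemma 2.2, we get $\Big| A_1 - \dfrac{a^{\alpha-1}}{\gamma}\,\dfrac{g_n}{n} \Big| \le c\big(1 + n^{\alpha-1-\min(\zeta,1)}\big) \le c\, n^{\max(\alpha-1-\zeta,\,0)}$, where $c$ does not depend on $n$ and $u \ge 0$. We also have, using (10) and (12),

$$|A_2| \le c\, a(n-1)\int_0^1 (1-at)^{n-2} f(t)\,dt \le c\big(n^{\max(\alpha-1-\zeta,\,0)} + n^{\varepsilon_0}\,\mathbf{1}_{\{\alpha-\zeta=1\}}\big).$$

We deduce, using Lemma 2.2 twice, that

$$\Big| A + \frac{a^{\alpha-1}}{\gamma}\,\frac{g_n}{n} \Big| \le c\big(n^{\max(\alpha-1-\zeta,\,0)} + n^{\varepsilon_0}\,\mathbf{1}_{\{\alpha-\zeta=1\}}\big) \le c\,\frac{g_n}{n}\,\varphi_n.$$

We deduce that

$$(1 - e^{u})\,\frac{n}{g_n}\, A = (1 - e^{u})\Big(-\frac{a^{\alpha-1}}{\gamma} + \varphi_n\, O(1)\Big) = \frac{u^{\alpha}}{\gamma} + (u^2 + u\varphi_n)\, h_2(n,u), \tag{22}$$

where $\sup_{u\in[0,K],\, n\ge2} |h_2(n,u)| < \infty$. Then use the expression of $\phi_n$ given by (20) as well as (21) and (22) to end the proof. □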

3. Asymptotics for the number of jumps

Let $\alpha \in (1,2)$. We assume that $\rho(t) = C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})$ for some $C_0 > 0$ and $\zeta > 1 - 1/\alpha$.

Let $V = (V_t, t \ge 0)$ be an $\alpha$-stable Lévy process with no positive jumps (see chap. VII in [3]) with Laplace exponent $\psi(u) = u^{\alpha}/\gamma$: for all $u \ge 0$, $E[e^{-uV_t}] = e^{t u^{\alpha}/\gamma}$.

Lemma 2.1 implies that $(X_1^{(n)}, \dots, X_k^{(n)})$ converges in distribution to $(X_1, \dots, X_k)$, where $(X_k, k \ge 1)$ is a sequence of independent random variables distributed as $X$. Using Lemma 2.1 and (12), we get that $P(X \ge k) \sim_{+\infty} \dfrac{1}{\Gamma(2-\alpha)}\, k^{-\alpha}$. Hence Proposition 9.39 in [8] implies that the law of $X$ is in the domain of attraction of the $\alpha$-stable distribution. We consider the process $t \mapsto n^{-1/\alpha}\sum_{k=1}^{\lfloor nt \rfloor}\big(X_k - \frac{1}{\gamma}\big)$ for $t \in [0,\gamma]$. An easy calculation using the Laplace transform of $X$ shows that, for fixed $t$, this sequence converges in distribution to $V_t$. Then using Theorem 16.14 in [14], we get that this process, indexed by $t \in [0,\gamma]$, converges in distribution to $V = (V_t, t \in [0,\gamma])$. We shall give in Corollary 3.5 a similar result with $X_k$ replaced by $X_k^{(n)}$.
We first give a proof of the convergence of $\tau_n$, see also [12] and [9] for a different proof. We will use that

$$\sum_{i=1}^{\tau_n}\Big(X_i^{(n)} - \frac{1}{\gamma}\Big) = n - 1 - \frac{\tau_n}{\gamma}.$$

Proposition 3.1. We assume that $\zeta > 1 - 1/\alpha$. We have the following convergence in distribution:

$$n^{-1/\alpha}\Big(n - \frac{\tau_n}{\gamma}\Big) \xrightarrow[n\to\infty]{(d)} V_\gamma.$$

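Proof. Using [18], it is enough to prove that $\lim_{n\to\infty} E\big[e^{-u n^{-1/\alpha}(n - \tau_n/\gamma)}\big] = e^{u^{\alpha}}$ for all $u \ge 0$. Let $\mathcal{Y} = (\mathcal{Y}_k, k \ge 0)$ be the filtration generated by $Y$. Notice $\tau_n$ is a $\mathcal{Y}$-stopping time. For fixed $n$, and for any $v \ge 0$, the process $(M_{v,k}, k \ge 0)$ defined by

$$M_{v,k} = \prod_{i=1}^{k} \exp\Big(-vX_i^{(n)} - \log \phi_{Y_{i-1}^{(n)}}(v)\Big)$$

is a bounded martingale w.r.t. the filtration $\mathcal{Y}$. Notice that $E[M_{v,k}] = 1$. As $X_i^{(n)} = 0$ for $i > \tau_n$, we also have

$$M_{v,k} = \prod_{i=1}^{k\wedge\tau_n} \exp\Big(-vX_i^{(n)} - \log \phi_{Y_{i-1}^{(n)}}(v)\Big). \tag{23}$$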

Let $u \ge 0$ and consider a non-negative sequence $(a_n, n \ge 1)$ which converges to 0. Using (19), we get that:

fcAr„ kAT„ 



(K/vin ti/vin / a a \ 

-ua^ ^ ) - ( + ^ + RivS , ua^) ) 

i=i i=i V 7 7 / 

In particular, we have 



(24) M„,„,.„ = exp -uan{n - 1 - ^) - - J] 



.(n) 



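We first give an upper bound for $\sum_{i=1}^{\tau_n} R(Y_{i-1}^{(n)}, u a_n)$.

Lemma 3.2. We assume that $\zeta > 1 - 1/\alpha$. Let $K > 0$. Let $\eta \ge 1/\alpha$. There exist $\varepsilon_1 > 0$ and $C_{(25)}(K)$ a finite constant such that for all $n \ge 1$ and $u \in [0,K]$, a.s. with $a_n = n^{-\eta}$,

$$\Big| \sum_{i=1}^{\tau_n} R\big(Y_{i-1}^{(n)}, u a_n\big) \Big| \le C_{(25)}(K)\, n^{-\varepsilon_1}. \tag{25}$$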

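Proof. Notice that $\tau_n \le n - 1$. We have seen in Lemma 2.5 that $R(n,u) = (u\varphi_n + u^2)\, h(n,u)$ with $h(K) = \sup_{u\in[0,K],\, n\ge2} |h(n,u)| < \infty$ and $\varphi_n$ given by (14). We have $2 - \alpha - \frac{1}{\alpha} = -\alpha(1 - 1/\alpha)^2 < 0$. As $\varepsilon_0 > 0$ is arbitrary in (14), we can take $\varepsilon_0$ small enough so that $1 - \alpha + \varepsilon_0 < 0$ and $2 - \alpha + \varepsilon_0 - 1/\alpha < 0$. We have

$$\sum_{i=1}^{\tau_n} \varphi_{Y_{i-1}^{(n)}} \le \sum_{j=1}^{n} \varphi_j \le c \begin{cases} n^{1-\zeta} & \text{if } \zeta < \alpha - 1,\\ n^{2-\alpha+\varepsilon_0} & \text{if } \zeta = \alpha - 1,\\ n^{2-\alpha} & \text{if } \zeta > \alpha - 1. \end{cases}$$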
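For $\varepsilon_1 > 0$ less than the two positive quantities $-1 + \zeta + \frac{1}{\alpha}$ and $-2 + \alpha - \varepsilon_0 + \frac{1}{\alpha}$, we have $a_n \sum_{i=1}^{\tau_n} \varphi_{Y_{i-1}^{(n)}} \le c\, n^{-\varepsilon_1}$. We deduce that, for $u \in [0,K]$,

$$\Big| \sum_{i=1}^{\tau_n} R\big(Y_{i-1}^{(n)}, u a_n\big) \Big| \le h(K) \sum_{i=1}^{\tau_n} \Big(\varphi_{Y_{i-1}^{(n)}}\, K a_n + (K a_n)^2\Big) \le h(K)\big(c K n^{-\varepsilon_1} + K^2 n^{1 - 2/\alpha}\big)$$

for some constant $c$ independent of $n$, $u$ and $K$. Taking $\varepsilon_1 > 0$ small enough so that $\varepsilon_1 < \frac{2}{\alpha} - 1$, we then get (25). □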

Next we prove the following Lemma.

Lemma 3.3. We assume that $\zeta > 1 - 1/\alpha$. Let $\varepsilon > 0$. The sequence $\big(n^{-1/\alpha - \varepsilon}\big(n - 1 - \frac{\tau_n}{\gamma}\big),\ n \ge 1\big)$ converges in probability to 0.

Proof. We set a„ = n" . Notice that 

e ' — J-V-l-uan,Tn ' 

As r„ < n — 1, we have < r„a" < n""*^. Using ([25]) . we get for u > 

E[M„„„,.Je-»)"'^^ <E[e-"'^"("-^-'f)] <E[Af„,„,.Je»)"'^" 

As $\tau_n$ is bounded, the stopping time theorem gives $E[M_{u a_n, \tau_n}] = 1$. We deduce that, for all $u \ge 0$, $\lim_{n\to\infty} E\big[e^{-u a_n (n - 1 - \tau_n/\gamma)}\big] = 1$. Using [18], we get the convergence in law of $a_n\big(n - 1 - \frac{\tau_n}{\gamma}\big)$ to 0, and then in probability as the limit is constant.

Let a„ = Q and n > 0. We have 
(26) E 



□ 



with Ii = E 
Using 



Ee ^Ml — e 

1 - e 



and ¥.[Muan,Tj = 1, we get 



+ E 



and /2 = E 



This implies that lim I2 = . 

We now prove that $\lim_{n\to+\infty} I_1 = 0$. Recall that $\tau_n \le n - 1$ so that $\tau_n a_n^{\alpha} \le 1$ and thanks to (25), we get



< M(n)E[M„,„,,J = Miu), 
where $M(u)$ is a constant which does not depend on $n$. By the Cauchy-Schwarz inequality, we get that



Ii=E 



< E 



e ^ -< ' 



E 



1 -u'^a^i^-n) 



< M(2'u)E 



Notice — > 1) is bounded from below and above by finite constants, and thanks 

to Lemma 3.3, it converges to 0 in probability. Hence, we deduce that



hm E 

n^oo 



0. 



This implies that $\lim_{n\to\infty} I_1 = 0$.



From the convergence of Ii and /21 we deduce from (j26p that hm E 

ra— >oo 

This ends the proof of the Proposition. 
We now give a general result. 

□

Proposition 3.4. We assume that $\zeta > 1 - 1/\alpha$. Let $f_n$ be uniformly bounded functions such that

$$\kappa = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{\lfloor n\gamma \rfloor} f_n(k/n)^{\alpha}$$

exists. Then we have the following convergence in distribution

Tn 1/1 

(27) := n~^^fnik/n)iX^ - -) ^ 



fc=i 



7 



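In particular, if $f$ is a bounded locally Riemann integrable function, then

$$V^{(n)}(f) = n^{-1/\alpha} \sum_{k=1}^{\tau_n} f(k/n)\Big(X_k^{(n)} - \frac{1}{\gamma}\Big) \xrightarrow[n\to\infty]{(d)} \int_0^{\gamma} f(t)\,dV_t, \tag{28}$$

where the distribution of $\int_0^{\gamma} f(t)\,dV_t$ is characterized by its Laplace transform: for $u \ge 0$,

$$E\Big[\exp\Big(-u \int_0^{\gamma} f(t)\,dV_t\Big)\Big] = \exp\Big(\frac{u^{\alpha}}{\gamma} \int_0^{\gamma} f(t)^{\alpha}\,dt\Big). \tag{29}$$

If we apply this Proposition with step functions, we deduce the following result.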

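Corollary 3.5. We assume that $\zeta > 1 - 1/\alpha$. Let $V_t^{(n)} = V^{(n)}(\mathbf{1}_{[0,t]}) = n^{-1/\alpha} \sum_{k=1}^{\lfloor nt \rfloor \wedge \tau_n} \big(X_k^{(n)} - \frac{1}{\gamma}\big)$ for $t \in [0,\gamma)$, and $V_\gamma^{(n)} = V^{(n)}(1) = n^{-1/\alpha}\big(n - 1 - \frac{\tau_n}{\gamma}\big)$. The finite-dimensional marginals of the process $(V_t^{(n)}, t \in [0,\gamma])$ converge in law to those of the process $(V_t, t \in [0,\gamma])$.

Proof. Thanks to [18], it is enough to prove that

$$\lim_{n\to\infty} E\big[\exp\big(-u V^{(n)}(f_n)\big)\big] = e^{\kappa u^{\alpha}/\gamma}.$$

Taking $u f_n$ as $f_n$, we shall only consider the case $u = 1$.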
We set $a = \sup_{n\ge1} \|f_n\|_\infty$ and, for any bounded function $g$,

$$A_n(g) = \exp\Big(\sum_{k=1}^{\tau_n} \Big(-n^{-1/\alpha} g(k/n)\, X_k^{(n)} - \log \phi_{Y_{k-1}^{(n)}}\big(n^{-1/\alpha} g(k/n)\big)\Big)\Big).$$

A martingale argument provides that $E[A_n(g)] = 1$. Using (19), we get that:

^1 g'^{k/n) _ 

k=l 






\ k=l ^ k=l ^ k=l J 



fc=i 



7 



k=\ 



Let A„ = X; _ „-i ^ „He 



fc=l 



fc=l 



E e 



7 



with Ii = E [e-^'"'(^") (1 - e^") 



and /2 = E 



= Ii + /2 

e-V'("'(/n)e^n 



First of all, let us prove that $I_1$ converges to 0 when $n$ tends to $\infty$. Recall that the functions $f_n$ are uniformly bounded by $a$. Thanks to (25), we have



A-ni'^fn) e ' 



-1 ^ ?lfSStM+J2l"^^ R{Y^%n~^2Uk)) 



< M, 



where M is a finite constant which does not depend on n. By Cauchy-Schwarz' inequality, 
we get that 



(/i)'< E 



e-^'"'(/n)|i_eA„| 


) <E 


g-y(")(2/„)" 


E 


"(1-e^")'" 


< ME 


"(1-e^")'" 

















Moreover as |1 — e^ I < e'^' —1 and A„ < — I \ n^\ — t^I, we get 

n7 



(30) 



E 



(1-e^")' 



< E 



[n7j -Tn |a° 

1 — e "T 



I I J — I 

The quantity $\big|\lfloor n\gamma \rfloor - \tau_n\big|/n$ is bounded and goes to 0 in probability when $n$ goes to infinity.

Therefore, the right-hand side of (30) converges to 0. This implies that $\lim_{n\to\infty} I_1 = 0$.
Let us now consider the convergence of $I_2$. Remark that



h = E 



Anifn) e 



Recall that $f_n$ is bounded by $a$ and that $E[A_n(f_n)] = 1$. Using Lemma 3.2, we get for some $\varepsilon > 0$



(31) e H25f°^" " ^k=i J 



< E 



Anifn) e 
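As $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{\lfloor n\gamma \rfloor} f_n^{\alpha}(k/n) = \kappa$, we get that $\lim_{n\to\infty} I_2 = e^{\kappa/\gamma}$, which completes the proof of (27).

To get (28), notice that $\kappa = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{\lfloor n\gamma \rfloor} f(k/n)^{\alpha} = \int_0^{\gamma} f(t)^{\alpha}\,dt$. □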

4. First approximation of the length of the coalescent tree

Let $\alpha \in (1,2)$. We assume that $\rho(t) = C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})$ for some $C_0 > 0$ and $\zeta > 1 - 1/\alpha$.
Recall that the length of the coalescent tree up to the $\lfloor nt \rfloor$-th coalescence is, for $t \ge 0$, given by (3). The next Lemma gives an upper bound on the error when one replaces the exponential random variables by their mean.



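Lemma 4.1. For $t \ge 0$, let

$$\tilde L_t^{(n)} = \sum_{k=0}^{\lfloor nt \rfloor \wedge (\tau_n - 1)} \frac{Y_k^{(n)}}{g_{Y_k^{(n)}}}.$$

There exists a finite constant $C_{(32)}$ such that for all $t \ge 0$, we have

$$E\Big[\big(L_t^{(n)} - \tilde L_t^{(n)}\big)^2\Big] \le C_{(32)} \begin{cases} n^{3-2\alpha} & \text{if } \alpha < 3/2,\\ \log(n) & \text{if } \alpha = 3/2,\\ 1 & \text{if } \alpha > 3/2. \end{cases} \tag{32}$$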

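Proof. Conditionally on $\mathcal{Y}$, the random variables $\dfrac{Y_k^{(n)}}{g_{Y_k^{(n)}}}(E_k - 1)$ are independent with zero mean. We deduce that

$$E\Big[\big(L_t^{(n)} - \tilde L_t^{(n)}\big)^2 \,\Big|\, \mathcal{Y}\Big] = E\Big[\Big(\sum_{k=0}^{\lfloor nt \rfloor \wedge (\tau_n - 1)} \frac{Y_k^{(n)}}{g_{Y_k^{(n)}}}(E_k - 1)\Big)^2 \,\Big|\, \mathcal{Y}\Big] = \sum_{k=0}^{\lfloor nt \rfloor \wedge (\tau_n - 1)} \Big(\frac{Y_k^{(n)}}{g_{Y_k^{(n)}}}\Big)^2 \le \sum_{\ell=1}^{n} \Big(\frac{\ell}{g_\ell}\Big)^2.$$

Thanks to (13), we get

$$E\Big[\big(L_t^{(n)} - \tilde L_t^{(n)}\big)^2 \,\Big|\, \mathcal{Y}\Big] \le c \begin{cases} n^{3-2\alpha} & \text{if } \alpha < 3/2,\\ \log(n) & \text{if } \alpha = 3/2,\\ 1 & \text{if } \alpha > 3/2, \end{cases}$$

where $c$ is non random. This implies the result.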
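□

Lemma 4.2. For $t \ge 0$, let

$$\hat L_t^{(n)} = \sum_{k=0}^{\lfloor nt \rfloor \wedge (\tau_n - 1)} \big(Y_k^{(n)}\big)^{1-\alpha}.$$

There exists a finite constant $C_{(33)}$ such that for all $t \ge 0$, we have

$$\Big| \tilde L_t^{(n)} - \frac{\hat L_t^{(n)}}{C_0\Gamma(2-\alpha)} \Big| \le C_{(33)} \begin{cases} n^{2-\alpha-\zeta} & \text{if } \zeta < 2 - \alpha,\\ \log(n) & \text{if } \zeta = 2 - \alpha,\\ 1 & \text{if } \zeta > 2 - \alpha. \end{cases} \tag{33}$$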


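Proof. Use (13) to get that

$$\Big| \tilde L_t^{(n)} - \frac{\hat L_t^{(n)}}{C_0\Gamma(2-\alpha)} \Big| \le c \sum_{k=0}^{\lfloor nt \rfloor \wedge (\tau_n - 1)} \big(Y_k^{(n)}\big)^{1-\alpha-\min(\zeta,1)}.$$

We deduce that

$$\Big| \tilde L_t^{(n)} - \frac{\hat L_t^{(n)}}{C_0\Gamma(2-\alpha)} \Big| \le c \begin{cases} n^{2-\alpha-\zeta} & \text{if } \zeta < 2 - \alpha,\\ \log(n) & \text{if } \zeta = 2 - \alpha,\\ 1 & \text{if } \zeta > 2 - \alpha. \end{cases}$$

□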

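5. Limit distribution of $\hat L_t^{(n)}$

Let $\alpha \in (1,2)$ and $\gamma = \alpha - 1$. For $t \in [0,\gamma]$, we set

$$v(t) = \int_0^t \Big(1 - \frac{r}{\gamma}\Big)^{-\gamma}\,dr.$$

Theorem 5.1. We assume that $\rho(t) = C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})$ for some $C_0 > 0$ and $\zeta > 1 - 1/\alpha$. Then for all $t \in (0,\gamma)$, we have that

(1) The following convergence in probability holds:

$$n^{-2+\alpha}\, \hat L_t^{(n)} \xrightarrow[n\to\infty]{P} v(t). \tag{34}$$

(2) The following convergence in distribution holds:

$$n^{-1+\alpha-1/\alpha}\big(\hat L_t^{(n)} - n^{2-\alpha} v(t)\big) \xrightarrow[n\to\infty]{(d)} (\alpha-1)\int_0^t dr\, \Big(1 - \frac{r}{\gamma}\Big)^{-\alpha} V_r. \tag{35}$$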

Proof of Theorem 5.1. Let $\varepsilon_2 \in (0,\gamma)$ and $t \in (0, \gamma - \varepsilon_2)$. We use a Taylor expansion to get



L 



k=0 \ i=l 

LntjA(T„-l) / 



k \ 



1=1 

!-E(4"'-i 



7 



7 



(36) 



k=0 \ 

LntjA(r„-l) , h\~'"' 

E (n-^j (1-A„,,)- 

+ iJn + 7(7 + l)^n 



JEAN-FRANÇOIS DELMAS, JEAN-STÉPHANE DHERSIN, AND ARNO SIRI-JÉGOUSSE



with An k = — T~i ~ 



n — 

[nt\A{Tn~l) 

In= ^ \ n 



k 

k=Q ^ ' 
k=l ^ i=l ' 



LntjA(T„-l) 

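Notice that a.s. $\Delta_{n,k} < 1$, so that $R_n$ is well defined.

Convergence of $I_n$. We first give an expansion of $I_n$ by considering $I_n = n^{2-\alpha} I_{n,1}\mathbf{1}_{\{nt \le \tau_n\}} + I_n \mathbf{1}_{\{nt > \tau_n\}}$ with $I_{n,1} = \dfrac{1}{n}\sum_{k=0}^{\lfloor nt \rfloor} \Big(1 - \dfrac{k}{n\gamma}\Big)^{-\gamma}$. Standard computation yields

$$I_{n,1} = v(t) + \frac{1}{n}\, h_3(n,t),$$

where $\sup_{t\in(0,\gamma-\varepsilon_2),\, n\ge1} |h_3(n,t)| < \infty$. By decomposing according to $\{nt \le \tau_n\}$ and $\{nt > \tau_n\}$, we deduce that

$$P\big(n^{-1+\alpha-1/\alpha}\, |I_n - n^{2-\alpha} v(t)| \ge \varepsilon\big) \le P\big(n^{-1/\alpha} |h_3(n,t)| \ge \varepsilon/2\big) + P(nt > \tau_n).$$

According to Lemma 3.3, $\tau_n/n$ converges in probability to $\gamma > t$. This implies that

$$\lim_{n\to+\infty} P(nt > \tau_n) = 0. \tag{37}$$

As $n^{-1/\alpha} |h_3(n,t)| < \varepsilon$ for $n$ large enough, we deduce the following convergence in probability:

$$n^{-1+\alpha-1/\alpha}\big(I_n - n^{2-\alpha} v(t)\big) \xrightarrow[n\to\infty]{P} 0. \tag{38}$$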
Convergence of J„. To get the convergence of J„, notice that 

\nt\A{Tn-l) L"<jA(rn-l) , ^\-" 

(39) Jn= i^t^--) E =nl-"J„,ll|„i<.„| + J„l|„i>.„|, 

i=l k=i ^ '^^ 

LntjA(r„-l) [nij 



with Jn,i = fn{i){X^'^^ ) and /„(r) = - ( 1 ) . The functions /„ 

are finite and uniformly bounded as for n > 2/ 62, 



(i-A) 






V ^7/ 





1 — — 1 ds < 00. 



Notice that 

k=i ^-^^ 



(1 - ^)-" 
We deduce from Proposition 3.4 that $(n^{-1/\alpha} J_{n,1}, n \ge 2)$ converges in distribution.
For $\varepsilon' > 0$, we have $P\big(\mathbf{1}_{\{nt > \tau_n\}} |J_n| \ge \varepsilon'\big) \le P(nt > \tau_n)$. Then we use (39) and (37) to conclude that the following convergence in distribution holds:



(40) 



n 



Convergence of $R_n$. We shall now prove that $n^{-1+\alpha-1/\alpha} R_n$ converges to 0 in probability.

Let $\varepsilon \in (0,\gamma)$. We have $R_n = R_{n,1} + R_{n,2}$, with



lnt\ 

Rn,l = ^ 
fc=l 



I l{fe<r„}-r?n,l,fc, 



7 



- l{A„,fc<l-e} 



{^n,k-t){l-t)-^-''dt, 



\nt\ 



Rn,2 - ^ 



k=l 



n 



7 



l{fc<rn}l{A„,fc>l-e} 



(A„,fc-t)(l-t)-^-2dt. 



We have for k < n(7 — 82), 



n\Rn,l,k\]<cE[{An,kf] <-^E 



Recall $\mathcal{Y} = (\mathcal{Y}_k, k \ge 0)$ is the filtration generated by $Y$. We consider the $\mathcal{Y}$-martingale $N_r = \sum_{i=1}^{r} \Delta N_i$, with $\Delta N_i = X_i^{(n)} - E[X_i^{(n)} \mid \mathcal{Y}_{i-1}]$. We have



E 



< 2E [N^] + 2E 



Notice that 



E [7V|] = E 



,i=l 



< E 



.4 = 1 



< E 



.i=l 



Using that, conditionally on $\mathcal{Y}_{i-1}$, $X_i^{(n)}$ and $X_1^{(Y_{i-1})}$ have the same distribution, we get that



E[N!]<j2n{x[^Y 



Thanks to (18) and (13), we deduce that



n .9 
J 
Using p5|) and p3|) . we get

k 

E ' ' ' 



< E 



< E 



.4 = 1 

' k 



1 

7 




where for the last inequality we used ([HI) with $\varepsilon_0 > 0$ small enough (such that $1 + 2\varepsilon_0 < \alpha$) and the fact that $\zeta > 1 - 1/\alpha$ implies $2 - 2\zeta < 3 - \alpha$ as $\alpha \in (1,2)$. This implies that $E[|R_{n,1,k}|] \le c\, n^{1-\alpha}$ and therefore $E[|R_{n,1}|] \le c\, n^{3-2\alpha}$. In particular, we get that $(n^{-1+\alpha-1/\alpha} R_{n,1}, n \ge 1)$ converges in probability to 0 since $-1 + \alpha - 1/\alpha + 3 - 2\alpha = -(\alpha-1)^2/\alpha < 0$ for $\alpha > 1$.
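
The exponent bookkeeping in this step can be checked numerically. The following Python sketch is not part of the original argument; the function names and the grid over $\alpha$ are illustrative choices. It verifies on a grid of $\alpha \in (1,2)$ that $-1+\alpha-1/\alpha+3-2\alpha$ equals $-(\alpha-1)^2/\alpha$ and is negative, and that $2/\alpha < 3-\alpha$, which is the inequality that turns $\zeta > 1-1/\alpha$ into $2-2\zeta < 3-\alpha$.

```python
def normalized_exponent(alpha):
    """Exponent of n in n^{-1+alpha-1/alpha} * n^{3-2*alpha}."""
    return (-1 + alpha - 1 / alpha) + (3 - 2 * alpha)


def closed_form(alpha):
    """Claimed closed form -(alpha-1)^2 / alpha of the same exponent."""
    return -(alpha - 1) ** 2 / alpha


# Illustrative grid strictly inside (1, 2).
alphas = [1 + k / 1000 for k in range(1, 1000)]

# Largest discrepancy between the two expressions over the grid.
max_gap = max(abs(normalized_exponent(a) - closed_form(a)) for a in alphas)

# The exponent is negative everywhere on the grid.
all_negative = all(normalized_exponent(a) < 0 for a in alphas)

# zeta > 1 - 1/alpha gives 2 - 2*zeta < 2/alpha, so 2/alpha < 3 - alpha suffices.
zeta_bound_ok = all(2 / a < 3 - a for a in alphas)

print(max_gap, all_negative, zeta_bound_ok)
```

A grid check of this kind is only a sanity test; the algebraic identity $2-\alpha-1/\alpha = -(\alpha-1)^2/\alpha$ is what the proof relies on.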

We now consider $R_{n,2}$. Suppose that $k \le \lfloor nt \rfloor - 1$ satisfies $\Delta_{n,k} > 1 - \varepsilon$ on $\{nt < \tau_n\}$. Then on $\{nt < \tau_n\}$, we have



$$\Delta_{n,k+1} = \Delta_{n,k} + \frac{X_{k+1}^{(n)} + (\Delta_{n,k}-1)/\gamma}{n-(k+1)/\gamma} \ge \Delta_{n,k} + \frac{X_{k+1}^{(n)} - 1}{n-(k+1)/\gamma} \ge \Delta_{n,k},$$



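where we used that $\gamma > \varepsilon$ for the first inequality and $X_{k+1}^{(n)} \ge 1$ for the last. In particular, on $\{nt < \tau_n\}$, if $\Delta_{n,k} > 1 - \varepsilon$ for some $k \le \lfloor nt \rfloor$, then we have $\Delta_{n,\lfloor nt \rfloor} > 1 - \varepsilon$. This implies that $1_{\{nt<\tau_n\}} R_{n,2} = 1_{\{\Delta_{n,\lfloor nt \rfloor} > 1-\varepsilon\}} 1_{\{nt<\tau_n\}} R_{n,2}$. With the notations of Corollary 3.5, we have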



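$$\{nt < \tau_n\} \cap \{\Delta_{n,\lfloor nt \rfloor} > 1-\varepsilon\} \subset \Big\{ n^{1/\alpha} V_t^{(n)} > (1-\varepsilon)\Big(n - \frac{\lfloor nt \rfloor}{\gamma}\Big) \Big\} \subset \{ n^{-1+1/\alpha} V_t^{(n)} > c \},$$

and then for any $\varepsilon' > 0$,

$$P\big(n^{-1+\alpha-1/\alpha}|R_{n,2}| > \varepsilon',\ nt < \tau_n\big) = P\big(1_{\{\Delta_{n,\lfloor nt \rfloor} > 1-\varepsilon\}}\, n^{-1+\alpha-1/\alpha}|R_{n,2}| > \varepsilon',\ nt < \tau_n\big)$$

$$\le P\big(\Delta_{n,\lfloor nt \rfloor} > 1-\varepsilon,\ nt < \tau_n\big)$$

$$\le P\big(n^{-1+1/\alpha} V_t^{(n)} > c\big).$$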



Use the convergence of $V^{(n)}$, see Corollary 3.5, to get that the right-hand side of the last inequality converges to 0 as $n$ goes to infinity. Then notice that $P(n^{-1+\alpha-1/\alpha}|R_{n,2}| > \varepsilon',\ nt > \tau_n) \le P(nt > \tau_n)$, which converges to 0 thanks to (37).
Thus the following convergence in probability holds:

(41) $n^{-1+\alpha-1/\alpha} R_n \xrightarrow[n\to\infty]{\mathbb{P}} 0.$



We deduce from ([Ml), (38), (40) and (41) that



(42) n-i+"-i/"fL;")-n2-"z;(t)') 7 



dr 



l/a 



dr 



[l _ -)-^ds 
7 



To conclude, use ([29l) to get that 7 

7 / / (1 )~°^ds which in turn is equal to / dr {1 )~°'Vr. 

Jo Jr 1 Jo 1 



1 - -)~''ds 
7 
1/0 

Vi is distributed as 






□ 

6. Proof of the main result 

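Let $\alpha_0 = \frac{1+\sqrt{5}}{2}$. Notice that for $\alpha \in (1,\alpha_0)$, we have $-1 + \alpha - 1/\alpha < 0$, whereas for $\alpha > \alpha_0$, $-1 + \alpha - 1/\alpha > 0$. Recall $\gamma = \alpha - 1$. We define $a(t)$ for $t \in [0,\gamma]$ by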

$$a(t) = \frac{v(t)}{C_0 \Gamma(2-\alpha)}, \quad \text{where} \quad v(t) = \int_0^t \Big(1 - \frac{r}{\gamma}\Big)^{-\gamma} dr.$$

We also set = — — / (1 - dr for t G (0,7). 

Cor(2 - a) 7o 7 

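Theorem 6.1. We assume that $\rho(t) = C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})$ for some $C_0 > 0$ and $\zeta > 1 - 1/\alpha$.

Then for all $t \in (0,\gamma)$, we have that

(1) The following convergence in probability holds:

(43) $n^{-2+\alpha} L_t^{(n)} \xrightarrow[n\to\infty]{\mathbb{P}} a(t).$

(2) If $\alpha \in (1,\alpha_0)$, the following convergence in distribution holds:

(44) $n^{-1+\alpha-1/\alpha}\big(L_t^{(n)} - a(t)\, n^{2-\alpha}\big) \xrightarrow[n\to\infty]{(d)} V_t^*.$

(3) If $\alpha \in [\alpha_0,2)$, the following convergence in probability holds: for any $\varepsilon > 0$,

(45) $n^{-\varepsilon}\big(L_t^{(n)} - a(t)\, n^{2-\alpha}\big) \xrightarrow[n\to\infty]{\mathbb{P}} 0.$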



Proof. First of all, let us consider the case $\alpha \in (1,\alpha_0)$. Lemma 4.1 and the Tchebychev inequality imply that for $\alpha \in (1,\alpha_0)$, we have the following convergence in probability

hm n-i+°-V°|LS")-Zl")| =0. 

n^oo 

This and Lemma 4.2 imply that for $\alpha \in (1,\alpha_0)$, we have the following convergence in probability

' * Cor(2-a)' 
The result is then a direct consequence of Theorem 5.1.

For $\alpha \in [\alpha_0, 2)$, note that $\alpha > 3/2$ and $-1 + \alpha - 1/\alpha \ge 0$. As $\zeta > 1 - 1/\alpha$ and $\alpha \ge \alpha_0$, i.e. $1 - 1/\alpha \ge 2 - \alpha$, we get $\zeta > 2 - \alpha$. We then use Lemma 4.1, Lemma 4.2 (only with $\zeta > 2 - \alpha$) and Theorem 5.1 to get (45), and then (43). □
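
The threshold used in this proof can be sanity-checked numerically. The sketch below is an illustration only: it assumes that $\alpha_0$ denotes the root in $(1,2)$ of $-1+\alpha-1/\alpha = 0$, that is $(1+\sqrt{5})/2$, and the names `ALPHA_0` and `h` are introduced here for convenience. It checks that this value is indeed a root, that it exceeds $3/2$, and that $1-1/\alpha \ge 2-\alpha$ holds exactly when $\alpha \ge \alpha_0$ on a grid.

```python
import math

# Assumed threshold: positive root of alpha^2 - alpha - 1 = 0.
ALPHA_0 = (1 + math.sqrt(5)) / 2


def h(alpha):
    """Normalization exponent -1 + alpha - 1/alpha."""
    return -1 + alpha - 1 / alpha


# Residual of h at the assumed root.
root_residual = abs(h(ALPHA_0))

# Illustrative grid strictly inside (1, 2).
alphas = [1 + k / 1000 for k in range(1, 1000)]

# h is negative exactly below the threshold.
sign_ok = all((h(a) < 0) == (a < ALPHA_0) for a in alphas)

# 1 - 1/alpha >= 2 - alpha exactly at or above the threshold.
equiv_ok = all((1 - 1 / a >= 2 - a) == (a >= ALPHA_0) for a in alphas)

print(ALPHA_0, root_residual, sign_ok, equiv_ok)
```

The equivalence is just a rearrangement: $1 - 1/\alpha \ge 2 - \alpha$ is the same statement as $-1 + \alpha - 1/\alpha \ge 0$.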

Let $K_t^{(n)}$ be the number of mutations up to the $\lfloor nt \rfloor$-th coalescence, for $t \in (0,\gamma)$. Conditionally on $L_t^{(n)}$, $K_t^{(n)}$ is a Poisson r.v. with parameter $\theta L_t^{(n)}$. The next corollary is a consequence of Theorem 6.1.
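
The conditional Poisson structure is what makes characteristic functions tractable here: for a Poisson variable $K$ with parameter $\lambda$, $E[e^{iuK}] = \exp(\lambda(e^{iu}-1))$, applied with $\lambda = \theta L_t^{(n)}$. The following Python sketch checks this standard identity by summing the Poisson series directly. The numerical values of `theta`, `length` and `u` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import cmath
import math


def poisson_cf_series(lam, u, terms=200):
    """Sum e^{iuk} * P(K = k) for K ~ Poisson(lam), term by term."""
    total = 0j
    pmf = math.exp(-lam)  # P(K = 0)
    for k in range(terms):
        total += pmf * cmath.exp(1j * u * k)
        pmf *= lam / (k + 1)  # P(K = k + 1) from P(K = k)
    return total


def poisson_cf_closed(lam, u):
    """Closed form exp(lam * (e^{iu} - 1))."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1))


# Hypothetical values: mutation rate, a fixed tree length, and an argument u.
theta, length, u = 0.7, 12.5, 0.9
lam = theta * length

gap = abs(poisson_cf_series(lam, u) - poisson_cf_closed(lam, u))
print(gap)
```

Taking the expectation of the closed form over the law of the length then gives the characteristic function of the number of mutations as a functional of the length alone.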

Corollary 6.2. We assume that $\rho(t) = C_0 t^{-\alpha} + O(t^{-\alpha+\zeta})$ for some $C_0 > 0$ and $\zeta > 1 - 1/\alpha$.
Let $t \in (0,\gamma)$ and $G$ be a standard Gaussian r.v., independent of $V$.

(1) For $\alpha \in (1,\sqrt{2})$, we have
(2) For $\alpha \in (\sqrt{2}, 2)$, we have

(3) For $\alpha = \sqrt{2}$, we have $-1 + \alpha - 1/\alpha = -1 + \alpha/2$ and

Proof. Let us compute the characteristic function $\psi_n(u,v)$ of the 2-dimensional r.v. $(G_n, H_n)$ with

Using that, conditionally on $L_t^{(n)}$, the law of $K_t^{(n)}$ is a Poisson distribution with parameter $\theta L_t^{(n)}$, we have

^„(u,v) =E [e^"^"e™^"] =E 

We first consider the case $\alpha \in (1,\alpha_0)$. Using Theorem 6.1, we get that

(l - e»/Ve-W-'-" +in/x/^a(i)n2-") 

tends to $-u^2/2$ in probability and has a non-negative real part. Hence, applying Theorem 6.1 again, we get that $(G_n, H_n)$ converges in distribution to $(G, V_t^*)$, where $G$ is a standard Gaussian r.v. independent of $V$. Notice that

We have $\sqrt{2} < \alpha_0$. To conclude when $\alpha < \alpha_0$, use that $1 - \alpha + 1/\alpha$ is smaller than (resp. equal to) $1 - \alpha/2$ if and only if $\alpha > \sqrt{2}$ (resp. $\alpha = \sqrt{2}$).
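
The comparison of the two exponents around $\sqrt{2}$ reduces to $1/\alpha < \alpha/2$, that is $\alpha^2 > 2$. A minimal Python sketch of this check follows; the function names `lhs` and `rhs` and the grid are illustrative and not taken from the paper.

```python
import math


def lhs(alpha):
    """Exponent 1 - alpha + 1/alpha."""
    return 1 - alpha + 1 / alpha


def rhs(alpha):
    """Exponent 1 - alpha/2."""
    return 1 - alpha / 2


# Illustrative grid strictly inside (1, 2).
alphas = [1 + k / 1000 for k in range(1, 1000)]

# lhs < rhs exactly when alpha > sqrt(2).
comparison_ok = all((lhs(a) < rhs(a)) == (a > math.sqrt(2)) for a in alphas)

# The two exponents coincide at alpha = sqrt(2), up to rounding.
equal_at_sqrt2 = abs(lhs(math.sqrt(2)) - rhs(math.sqrt(2)))

print(comparison_ok, equal_at_sqrt2)
```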
Now we consider $\alpha \in [\alpha_0, 2)$. We write

^-i+a/2(^(n) _ ^^(i)„2-a) ^ + n-^+^/^lt"^ - a{t)n'-"). 

Using Theorem 6.1, we still get that $G_n$ converges in law to $G$. Moreover, (45) implies that $n^{-1+\alpha/2}(L_t^{(n)} - a(t)\, n^{2-\alpha})$ converges to 0 in probability. This gives the result. □